Dataset statistics
| Number of variables | 32 |
|---|---|
| Number of observations | 1296675 |
| Missing cells | 1318 |
| Missing cells (%) | < 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 294.3 MiB |
| Average record size in memory | 238.0 B |
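The derived figures in this table can be cross-checked from the raw counts. The sketch below is only an arithmetic check on the numbers shown above; the variable names are illustrative and it assumes 1 MiB = 1024 × 1024 bytes.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 32
n_observations = 1_296_675
missing_cells = 1_318
total_size_mib = 294.3

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumes 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing-cell share comes out well under 0.1%, which is why the report only shows "< 0.1%".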
Variable types
| DateTime | 1 |
|---|---|
| Numeric | 16 |
| Categorical | 13 |
| Unsupported | 2 |
merchant has a high cardinality: 693 distinct values | High cardinality |
street has a high cardinality: 983 distinct values | High cardinality |
city has a high cardinality: 894 distinct values | High cardinality |
state has a high cardinality: 51 distinct values | High cardinality |
job has a high cardinality: 494 distinct values | High cardinality |
trans_num has a high cardinality: 1296675 distinct values | High cardinality |
name has a high cardinality: 973 distinct values | High cardinality |
lat is highly correlated with merch_lat | High correlation |
long is highly correlated with merch_long | High correlation |
merch_lat is highly correlated with lat | High correlation |
merch_long is highly correlated with long | High correlation |
trans_month is highly correlated with trans_week | High correlation |
trans_week is highly correlated with trans_month | High correlation |
category is highly correlated with trans_hour and 1 other fields | High correlation |
state is highly correlated with zip and 7 other fields | High correlation |
zip is highly correlated with state and 4 other fields | High correlation |
lat is highly correlated with state and 4 other fields | High correlation |
long is highly correlated with state and 4 other fields | High correlation |
city_pop is highly correlated with state | High correlation |
merch_lat is highly correlated with state and 4 other fields | High correlation |
merch_long is highly correlated with state and 4 other fields | High correlation |
trans_year is highly correlated with trans_month and 1 other fields | High correlation |
trans_month is highly correlated with trans_year and 1 other fields | High correlation |
trans_week is highly correlated with trans_year and 1 other fields | High correlation |
trans_hour is highly correlated with category | High correlation |
age is highly correlated with state and 1 other fields | High correlation |
amt_group is highly correlated with category | High correlation |
age_group is highly correlated with state and 1 other fields | High correlation |
amt is highly skewed (γ1 = 42.27787379) | Skewed |
trans_num is uniformly distributed | Uniform |
trans_num has unique values | Unique |
distance has unique values | Unique |
coords_ori is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
coords_merch is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
trans_hour has 42502 (3.3%) zeros | Zeros |
trans_minute has 21372 (1.6%) zeros | Zeros |
trans_dayofweek has 254282 (19.6%) zeros | Zeros |
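The high-cardinality alerts above are consistent with a simple cutoff on the number of distinct values: state (51) is flagged while category (14) and gender (2) are not. A minimal sketch, assuming a cutoff of 50; that value is a guess that fits these alerts and was not read from config.json.

```python
# Assumed cutoff; not read from the report's config.json.
CARDINALITY_THRESHOLD = 50

# Distinct counts copied from the alerts and variable tables in this report.
distinct_counts = {
    "merchant": 693,
    "category": 14,
    "gender": 2,
    "street": 983,
    "city": 894,
    "state": 51,
    "job": 494,
    "trans_num": 1_296_675,
    "name": 973,
}

# A variable is flagged when it has more distinct values than the cutoff.
flagged = sorted(
    name for name, n in distinct_counts.items() if n > CARDINALITY_THRESHOLD
)
print(flagged)
```

Under this assumption the flagged set matches the seven variables listed in the alerts.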
Reproduction
| Analysis started | 2022-03-02 14:44:43.186655 |
|---|---|
| Analysis finished | 2022-03-02 14:48:23.428924 |
| Duration | 3 minutes and 40.24 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
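The reported duration follows directly from the two timestamps. The snippet below is a small standard-library check, not part of the report itself.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" table above.
started = datetime.strptime("2022-03-02 14:44:43.186655", FMT)
finished = datetime.strptime("2022-03-02 14:48:23.428924", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```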
trans_datetime
Date
| Distinct | 1274791 |
|---|---|
| Distinct (%) | 98.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| Minimum | 2019-01-01 00:00:18 |
|---|---|
| Maximum | 2020-06-21 12:13:37 |
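The minimum and maximum give the time span covered by the data, and together with the observation count a rough average transaction rate. A minimal sketch using only the values shown above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Minimum and maximum of trans_datetime, copied from the table above.
first = datetime.strptime("2019-01-01 00:00:18", FMT)
last = datetime.strptime("2020-06-21 12:13:37", FMT)
n_observations = 1_296_675

span = last - first
span_days = span.total_seconds() / 86_400  # seconds per day
per_day = n_observations / span_days

print(f"span: {span.days} full days")
print(f"about {per_day:.0f} transactions per day on average")
```

This is only an average over the whole period; the report itself does not state how transactions are distributed over time.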
cc_num
Real number (ℝ≥0)
| Distinct | 983 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.171920421 × 10^17 |
| Minimum | 6.041620718 × 10^10 |
|---|---|
| Maximum | 4.992346398 × 10^18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 6.041620718 × 10^10 |
|---|---|
| 5-th percentile | 6.304848798 × 10^11 |
| Q1 | 1.800429465 × 10^14 |
| median | 3.521417321 × 10^15 |
| Q3 | 4.642255475 × 10^15 |
| 95-th percentile | 4.497913966 × 10^18 |
| Maximum | 4.992346398 × 10^18 |
| Range | 4.992346338 × 10^18 |
| Interquartile range (IQR) | 4.462212529 × 10^15 |
Descriptive statistics
| Standard deviation | 1.308806447 × 10^18 |
|---|---|
| Coefficient of variation (CV) | 3.1371798 |
| Kurtosis | 6.179949935 |
| Mean | 4.171920421 × 10^17 |
| Median Absolute Deviation (MAD) | 3.076470873 × 10^15 |
| Skewness | 2.851879006 |
| Sum | -6.725541877 × 10^18 |
| Variance | 1.712974316 × 10^36 |
| Monotonicity | Not monotonic |
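The Sum for cc_num is negative even though the column has no negative values, so it cannot be the true total. One plausible explanation, offered here as an assumption rather than something the report states, is that the sum was accumulated in a 64-bit integer and wrapped around. The sketch below checks the reported coefficient of variation and shows that the sum implied by the mean is far outside the int64 range; `wrap_int64` is a hypothetical helper for illustration.

```python
INT64_MAX = 2**63 - 1

# Values copied from the cc_num tables above.
mean = 4.171920421e17
std = 1.308806447e18
n_observations = 1_296_675

# Coefficient of variation = standard deviation / mean.
cv = std / mean

# Sum implied by mean * n; far larger than any int64 can hold.
implied_sum = mean * n_observations


def wrap_int64(value: int) -> int:
    """Reduce a Python int to its two's-complement 64-bit value (illustrative helper)."""
    return (value + 2**63) % 2**64 - 2**63


print(f"CV: {cv:.7f}")
print(f"implied sum exceeds int64 max: {implied_sum > INT64_MAX}")
print(f"INT64_MAX + 1 wraps to: {wrap_int64(INT64_MAX + 1)}")
```

Since cc_num is an identifier rather than a quantity, its sum and other moments carry little meaning either way.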
| Value | Count | Frequency (%) |
| 5.713652351 × 10^11 | 3123 | 0.2% |
| 4.512828415 × 10^18 | 3123 | 0.2% |
| 3.672269902 × 10^13 | 3119 | 0.2% |
| 2.131124026 × 10^14 | 3117 | 0.2% |
| 3.54510934 × 10^15 | 3113 | 0.2% |
| 6.534628261 × 10^15 | 3112 | 0.2% |
| 6.011367958 × 10^15 | 3110 | 0.2% |
| 2.720433096 × 10^15 | 3107 | 0.2% |
| 6.011438889 × 10^15 | 3106 | 0.2% |
| 6.011109737 × 10^15 | 3101 | 0.2% |
| Other values (973) | 1265544 |
| Value | Count | Frequency (%) |
| 6.041620718 × 10^10 | 1518 | |
| 6.042292873 × 10^10 | 1531 | |
| 6.042309813 × 10^10 | 510 | < 0.1% |
| 6.042785159 × 10^10 | 528 | < 0.1% |
| 6.048700208 × 10^10 | 496 | < 0.1% |
| 6.04905963 × 10^10 | 1010 | |
| 6.049559311 × 10^10 | 518 | < 0.1% |
| 5.018029536 × 10^11 | 1559 | |
| 5.018181333 × 10^11 | 8 | < 0.1% |
| 5.018282048 × 10^11 | 515 | < 0.1% |
| Value | Count | Frequency (%) |
| 4.992346398 × 10^18 | 2059 | |
| 4.989847571 × 10^18 | 1007 | 0.1% |
| 4.980323468 × 10^18 | 532 | < 0.1% |
| 4.973530368 × 10^18 | 1040 | |
| 4.958589672 × 10^18 | 1476 | |
| 4.95682899 × 10^18 | 2566 | |
| 4.911818931 × 10^18 | 9 | < 0.1% |
| 4.906628656 × 10^18 | 2584 | |
| 4.897067971 × 10^18 | 1038 | |
| 4.890424427 × 10^18 | 1496 |
merchant
Categorical
| Distinct | 693 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| fraud_Kilback_LLC | 4403 |
|---|---|
| fraud_Cormier_LLC | 3649 |
| fraud_Schumm_PLC | 3634 |
| fraud_Kuhn_LLC | 3510 |
| fraud_Boyer_PLC | 3493 |
| Other values (688) |
Length
| Max length | 43 |
|---|---|
| Median length | 20 |
| Mean length | 23.13259683 |
| Min length | 13 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | fraud_Rippin,_Kub_and_Mann |
|---|---|
| 2nd row | fraud_Heller,_Gutmann_and_Zieme |
| 3rd row | fraud_Lind-Buckridge |
| 4th row | fraud_Kutch,_Hermiston_and_Farrell |
| 5th row | fraud_Keeling-Crist |
Common Values
| Value | Count | Frequency (%) |
| fraud_Kilback_LLC | 4403 | 0.3% |
| fraud_Cormier_LLC | 3649 | 0.3% |
| fraud_Schumm_PLC | 3634 | 0.3% |
| fraud_Kuhn_LLC | 3510 | 0.3% |
| fraud_Boyer_PLC | 3493 | 0.3% |
| fraud_Dickinson_Ltd | 3434 | 0.3% |
| fraud_Cummerata-Jones | 2736 | 0.2% |
| fraud_Kutch_LLC | 2734 | 0.2% |
| fraud_Olson,_Becker_and_Koch | 2723 | 0.2% |
| fraud_Stroman,_Hudson_and_Erdman | 2721 | 0.2% |
| Other values (683) | 1263638 |
Words
| Value | Count | Frequency (%) |
| fraud_kilback_llc | 4403 | 0.3% |
| fraud_cormier_llc | 3649 | 0.3% |
| fraud_schumm_plc | 3634 | 0.3% |
| fraud_kuhn_llc | 3510 | 0.3% |
| fraud_boyer_plc | 3493 | 0.3% |
| fraud_dickinson_ltd | 3434 | 0.3% |
| fraud_cummerata-jones | 2736 | 0.2% |
| fraud_kutch_llc | 2734 | 0.2% |
| fraud_olson,_becker_and_koch | 2723 | 0.2% |
| fraud_stroman,_hudson_and_erdman | 2721 | 0.2% |
| Other values (683) | 1263638 |
category
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| gas_transport | 131659 |
|---|---|
| grocery_pos | 123638 |
| home | 123115 |
| shopping_pos | 116672 |
| kids_pets | 113035 |
| Other values (9) | 688556 |
Length
| Max length | 14 |
|---|---|
| Median length | 11 |
| Mean length | 10.52607862 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | misc_net |
|---|---|
| 2nd row | grocery_pos |
| 3rd row | entertainment |
| 4th row | gas_transport |
| 5th row | misc_pos |
Common Values
| Value | Count | Frequency (%) |
| gas_transport | 131659 | |
| grocery_pos | 123638 | |
| home | 123115 | |
| shopping_pos | 116672 | |
| kids_pets | 113035 | |
| shopping_net | 97543 | |
| entertainment | 94014 | |
| food_dining | 91461 | 7.1% |
| personal_care | 90758 | 7.0% |
| health_fitness | 85879 | 6.6% |
| Other values (4) | 228901 |
Words
| Value | Count | Frequency (%) |
| gas_transport | 131659 | |
| grocery_pos | 123638 | |
| home | 123115 | |
| shopping_pos | 116672 | |
| kids_pets | 113035 | |
| shopping_net | 97543 | |
| entertainment | 94014 | |
| food_dining | 91461 | 7.1% |
| personal_care | 90758 | 7.0% |
| health_fitness | 85879 | 6.6% |
| Other values (4) | 228901 |
amt
Real number (ℝ≥0)
| Distinct | 52928 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 70.35103546 |
| Minimum | 1 |
|---|---|
| Maximum | 28948.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2.44 |
| Q1 | 9.65 |
| median | 47.52 |
| Q3 | 83.14 |
| 95-th percentile | 196.31 |
| Maximum | 28948.9 |
| Range | 28947.9 |
| Interquartile range (IQR) | 73.49 |
Descriptive statistics
| Standard deviation | 160.3160386 |
|---|---|
| Coefficient of variation (CV) | 2.278801407 |
| Kurtosis | 4545.644979 |
| Mean | 70.35103546 |
| Median Absolute Deviation (MAD) | 37.5 |
| Skewness | 42.27787379 |
| Sum | 91222428.9 |
| Variance | 25701.23222 |
| Monotonicity | Not monotonic |
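The amt statistics describe a strongly right-skewed distribution: the mean is well above the median and the maximum is far from the bulk of the data. The sketch below re-derives the IQR and CV from the tables above and adds Tukey's upper fence (Q3 + 1.5 × IQR). The fence is a common rule of thumb brought in here for illustration; the report does not compute it.

```python
# Values copied from the amt tables above.
q1, q3 = 9.65, 83.14
mean, std = 70.35103546, 160.3160386
maximum = 28948.9

iqr = q3 - q1
cv = std / mean

# Tukey's rule (an assumption here, not something the report computes):
# values above Q3 + 1.5 * IQR are treated as potential outliers.
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr:.2f}, CV: {cv:.6f}")
print(f"upper fence: {upper_fence:.3f}")
print(f"maximum is about {maximum / upper_fence:.0f} times the upper fence")
```

The fence lands just below the reported 95-th percentile (196.31), so slightly more than 5% of transactions would count as high outliers under this rule.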
| Value | Count | Frequency (%) |
| 1.14 | 542 | < 0.1% |
| 1.04 | 538 | < 0.1% |
| 1.25 | 535 | < 0.1% |
| 1.02 | 533 | < 0.1% |
| 1.01 | 523 | < 0.1% |
| 1.05 | 519 | < 0.1% |
| 1.2 | 516 | < 0.1% |
| 1.23 | 515 | < 0.1% |
| 1.08 | 512 | < 0.1% |
| 1.11 | 509 | < 0.1% |
| Other values (52918) | 1291433 |
| Value | Count | Frequency (%) |
| 1 | 222 | |
| 1.01 | 523 | |
| 1.02 | 533 | |
| 1.03 | 499 | |
| 1.04 | 538 | |
| 1.05 | 519 | |
| 1.06 | 471 | |
| 1.07 | 498 | |
| 1.08 | 512 | |
| 1.09 | 496 |
| Value | Count | Frequency (%) |
| 28948.9 | 1 | |
| 27390.12 | 1 | |
| 27119.77 | 1 | |
| 26544.12 | 1 | |
| 25086.94 | 1 | |
| 17897.24 | 1 | |
| 15305.95 | 1 | |
| 15047.03 | 1 | |
| 15034.18 | 1 | |
| 14849.74 | 1 |
gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| F | 709863 |
|---|---|
| M | 586812 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | F |
|---|---|
| 2nd row | F |
| 3rd row | M |
| 4th row | M |
| 5th row | M |
Common Values
| Value | Count | Frequency (%) |
| F | 709863 | |
| M | 586812 |
Words
| Value | Count | Frequency (%) |
| f | 709863 | |
| m | 586812 |
street
Categorical
| Distinct | 983 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| 0069 Robin Brooks Apt. 695 | 3123 |
|---|---|
| 864 Reynolds Plains | 3123 |
| 8172 Robertson Parkways Suite 072 | 3119 |
| 4664 Sanchez Common Suite 930 | 3117 |
| 8030 Beck Motorway | 3113 |
| Other values (978) |
Length
| Max length | 35 |
|---|---|
| Median length | 22 |
| Mean length | 22.22902655 |
| Min length | 12 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 561 Perry Cove |
|---|---|
| 2nd row | 43039 Riley Greens Suite 393 |
| 3rd row | 594 White Dale Suite 530 |
| 4th row | 9443 Cynthia Court Apt. 038 |
| 5th row | 408 Bradley Rest |
Common Values
| Value | Count | Frequency (%) |
| 0069 Robin Brooks Apt. 695 | 3123 | 0.2% |
| 864 Reynolds Plains | 3123 | 0.2% |
| 8172 Robertson Parkways Suite 072 | 3119 | 0.2% |
| 4664 Sanchez Common Suite 930 | 3117 | 0.2% |
| 8030 Beck Motorway | 3113 | 0.2% |
| 29606 Martinez Views Suite 653 | 3112 | 0.2% |
| 1652 James Mews | 3110 | 0.2% |
| 854 Walker Dale Suite 488 | 3107 | 0.2% |
| 40624 Rebecca Spurs | 3106 | 0.2% |
| 594 Berry Lights Apt. 392 | 3101 | 0.2% |
| Other values (973) | 1265544 |
Words
| Value | Count | Frequency (%) |
| apt | 327791 | 6.4% |
| suite | 305467 | 5.9% |
| island | 22954 | 0.4% |
| michael | 18967 | 0.4% |
| common | 17978 | 0.3% |
| station | 17957 | 0.3% |
| islands | 17917 | 0.3% |
| david | 17476 | 0.3% |
| brooks | 16991 | 0.3% |
| fields | 16321 | 0.3% |
| Other values (1940) | 4376722 |
city
Categorical
| Distinct | 894 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| Birmingham | 5617 |
|---|---|
| San_Antonio | 5130 |
| Utica | 5105 |
| Phoenix | 5075 |
| Meridian | 5060 |
| Other values (889) |
Length
| Max length | 25 |
|---|---|
| Median length | 8 |
| Mean length | 8.652245937 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Moravian_Falls |
|---|---|
| 2nd row | Orient |
| 3rd row | Malad_City |
| 4th row | Boulder |
| 5th row | Doe_Hill |
Common Values
| Value | Count | Frequency (%) |
| Birmingham | 5617 | 0.4% |
| San_Antonio | 5130 | 0.4% |
| Utica | 5105 | 0.4% |
| Phoenix | 5075 | 0.4% |
| Meridian | 5060 | 0.4% |
| Thomas | 4634 | 0.4% |
| Conway | 4613 | 0.4% |
| Cleveland | 4604 | 0.4% |
| Warren | 4599 | 0.4% |
| Houston | 4168 | 0.3% |
| Other values (884) | 1248070 |
Words
| Value | Count | Frequency (%) |
| birmingham | 5617 | 0.4% |
| san_antonio | 5130 | 0.4% |
| utica | 5105 | 0.4% |
| phoenix | 5075 | 0.4% |
| meridian | 5060 | 0.4% |
| thomas | 4634 | 0.4% |
| conway | 4613 | 0.4% |
| cleveland | 4604 | 0.4% |
| warren | 4599 | 0.4% |
| houston | 4168 | 0.3% |
| Other values (884) | 1248070 |
state
Categorical
| Distinct | 51 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| TX | 94876 |
|---|---|
| NY | 83501 |
| PA | 79847 |
| CA | 56360 |
| OH | 46480 |
| Other values (46) |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | NC |
|---|---|
| 2nd row | WA |
| 3rd row | ID |
| 4th row | MT |
| 5th row | VA |
Common Values
| Value | Count | Frequency (%) |
| TX | 94876 | 7.3% |
| NY | 83501 | 6.4% |
| PA | 79847 | 6.2% |
| CA | 56360 | 4.3% |
| OH | 46480 | 3.6% |
| MI | 46154 | 3.6% |
| IL | 43252 | 3.3% |
| FL | 42671 | 3.3% |
| AL | 40989 | 3.2% |
| MO | 38403 | 3.0% |
| Other values (41) | 724142 |
Words
| Value | Count | Frequency (%) |
| tx | 94876 | 7.3% |
| ny | 83501 | 6.4% |
| pa | 79847 | 6.2% |
| ca | 56360 | 4.3% |
| oh | 46480 | 3.6% |
| mi | 46154 | 3.6% |
| il | 43252 | 3.3% |
| fl | 42671 | 3.3% |
| al | 40989 | 3.2% |
| mo | 38403 | 3.0% |
| Other values (41) | 724142 |
zip
Real number (ℝ≥0)
| Distinct | 970 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48800.6711 |
| Minimum | 1257 |
|---|---|
| Maximum | 99783 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 1257 |
|---|---|
| 5-th percentile | 7208 |
| Q1 | 26237 |
| median | 48174 |
| Q3 | 72042 |
| 95-th percentile | 94569 |
| Maximum | 99783 |
| Range | 98526 |
| Interquartile range (IQR) | 45805 |
Descriptive statistics
| Standard deviation | 26893.22248 |
|---|---|
| Coefficient of variation (CV) | 0.551083046 |
| Kurtosis | -1.096449332 |
| Mean | 48800.6711 |
| Median Absolute Deviation (MAD) | 23068 |
| Skewness | 0.07968075775 |
| Sum | 6.32786102 × 10^10 |
| Variance | 723245415.2 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 73754 | 3646 | 0.3% |
| 34112 | 3613 | 0.3% |
| 48088 | 3597 | 0.3% |
| 82514 | 3527 | 0.3% |
| 49628 | 3123 | 0.2% |
| 15484 | 3123 | 0.2% |
| 85173 | 3119 | 0.2% |
| 29819 | 3117 | 0.2% |
| 38761 | 3113 | 0.2% |
| 5461 | 3112 | 0.2% |
| Other values (960) | 1263585 |
| Value | Count | Frequency (%) |
| 1257 | 2023 | |
| 1330 | 1031 | 0.1% |
| 1535 | 515 | < 0.1% |
| 1545 | 1024 | 0.1% |
| 1612 | 519 | < 0.1% |
| 1843 | 2597 | |
| 1844 | 2058 | |
| 2180 | 519 | < 0.1% |
| 2630 | 2090 | |
| 2908 | 550 | < 0.1% |
| Value | Count | Frequency (%) |
| 99783 | 1568 | |
| 99747 | 12 | < 0.1% |
| 99746 | 540 | < 0.1% |
| 99323 | 2572 | |
| 99160 | 3030 | |
| 99116 | 15 | < 0.1% |
| 99113 | 1047 | 0.1% |
| 99033 | 2458 | |
| 98836 | 524 | < 0.1% |
| 98665 | 500 | < 0.1% |
lat
Real number (ℝ≥0)
| Distinct | 968 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.53762161 |
| Minimum | 20.0271 |
|---|---|
| Maximum | 66.6933 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 20.0271 |
|---|---|
| 5-th percentile | 29.8826 |
| Q1 | 34.6205 |
| median | 39.3543 |
| Q3 | 41.9404 |
| 95-th percentile | 45.8433 |
| Maximum | 66.6933 |
| Range | 46.6662 |
| Interquartile range (IQR) | 7.3199 |
Descriptive statistics
| Standard deviation | 5.075808439 |
|---|---|
| Coefficient of variation (CV) | 0.1317104748 |
| Kurtosis | 0.8129679455 |
| Mean | 38.53762161 |
| Median Absolute Deviation (MAD) | 3.3597 |
| Skewness | -0.1860276801 |
| Sum | 49970770.51 |
| Variance | 25.76383131 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 36.385 | 3646 | 0.3% |
| 26.1184 | 3613 | 0.3% |
| 42.5164 | 3597 | 0.3% |
| 43.0048 | 3527 | 0.3% |
| 39.8936 | 3123 | 0.2% |
| 44.5995 | 3123 | 0.2% |
| 33.2887 | 3119 | 0.2% |
| 34.0326 | 3117 | 0.2% |
| 33.4783 | 3113 | 0.2% |
| 44.3346 | 3112 | 0.2% |
| Other values (958) | 1263585 |
| Value | Count | Frequency (%) |
| 20.0271 | 1527 | |
| 20.0827 | 1032 | 0.1% |
| 24.6557 | 2584 | |
| 26.1184 | 3613 | |
| 26.3304 | 542 | < 0.1% |
| 26.3771 | 518 | < 0.1% |
| 26.4215 | 3038 | |
| 26.4722 | 2524 | |
| 26.529 | 1549 | |
| 26.6939 | 1027 | 0.1% |
| Value | Count | Frequency (%) |
| 66.6933 | 12 | < 0.1% |
| 65.6899 | 540 | < 0.1% |
| 64.7556 | 1568 | |
| 48.8878 | 3030 | |
| 48.8856 | 2066 | |
| 48.8328 | 1533 | |
| 48.6669 | 1047 | 0.1% |
| 48.6031 | 2973 | |
| 48.4786 | 2038 | |
| 48.34 | 3088 |
long
Real number (ℝ)
| Distinct | 969 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -90.22633538 |
| Minimum | -165.6723 |
|---|---|
| Maximum | -67.9503 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1296675 |
| Negative (%) | 100.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | -165.6723 |
|---|---|
| 5-th percentile | -119.0825 |
| Q1 | -96.798 |
| median | -87.4769 |
| Q3 | -80.158 |
| 95-th percentile | -73.5112 |
| Maximum | -67.9503 |
| Range | 97.722 |
| Interquartile range (IQR) | 16.64 |
Descriptive statistics
| Standard deviation | 13.75907695 |
|---|---|
| Coefficient of variation (CV) | -0.1524951323 |
| Kurtosis | 1.855892285 |
| Mean | -90.22633538 |
| Median Absolute Deviation (MAD) | 8.1527 |
| Skewness | -1.150107737 |
| Sum | -116994233.4 |
| Variance | 189.3121984 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -98.0727 | 3646 | 0.3% |
| -81.7361 | 3613 | 0.3% |
| -82.9832 | 3597 | 0.3% |
| -108.8964 | 3527 | 0.3% |
| -79.7856 | 3123 | 0.2% |
| -86.2141 | 3123 | 0.2% |
| -111.0985 | 3119 | 0.2% |
| -82.2027 | 3117 | 0.2% |
| -90.5142 | 3113 | 0.2% |
| -73.098 | 3112 | 0.2% |
| Other values (959) | 1263585 |
| Value | Count | Frequency (%) |
| -165.6723 | 1568 | |
| -156.292 | 540 | < 0.1% |
| -155.488 | 1032 | |
| -155.3697 | 1527 | |
| -153.994 | 12 | < 0.1% |
| -124.4409 | 1043 | |
| -124.2174 | 1547 | |
| -124.1587 | 1031 | |
| -124.1437 | 1526 | |
| -123.9743 | 2036 |
| Value | Count | Frequency (%) |
| -67.9503 | 2080 | |
| -68.5565 | 1014 | 0.1% |
| -69.2675 | 519 | < 0.1% |
| -69.4828 | 2050 | |
| -69.9576 | 537 | < 0.1% |
| -69.9656 | 3107 | |
| -70.1031 | 9 | < 0.1% |
| -70.239 | 1036 | 0.1% |
| -70.3001 | 2090 | |
| -70.3457 | 1527 |
city_pop
Real number (ℝ≥0)
| Distinct | 879 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 88824.44056 |
| Minimum | 23 |
|---|---|
| Maximum | 2906700 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 23 |
|---|---|
| 5-th percentile | 139 |
| Q1 | 743 |
| median | 2456 |
| Q3 | 20328 |
| 95-th percentile | 525713 |
| Maximum | 2906700 |
| Range | 2906677 |
| Interquartile range (IQR) | 19585 |
Descriptive statistics
| Standard deviation | 301956.3607 |
|---|---|
| Coefficient of variation (CV) | 3.399473825 |
| Kurtosis | 37.6145193 |
| Mean | 88824.44056 |
| Median Absolute Deviation (MAD) | 2198 |
| Skewness | 5.593853067 |
| Sum | 1.151764315 × 10^11 |
| Variance | 9.117764376 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 606 | 5496 | 0.4% |
| 1595797 | 5130 | 0.4% |
| 1312922 | 5075 | 0.4% |
| 1766 | 4574 | 0.4% |
| 241 | 4533 | 0.3% |
| 2906700 | 4168 | 0.3% |
| 276002 | 4155 | 0.3% |
| 302 | 4147 | 0.3% |
| 910148 | 4073 | 0.3% |
| 198 | 4067 | 0.3% |
| Other values (869) | 1251257 |
| Value | Count | Frequency (%) |
| 23 | 2049 | |
| 37 | 1013 | 0.1% |
| 43 | 2034 | |
| 46 | 3040 | |
| 47 | 511 | < 0.1% |
| 49 | 1054 | 0.1% |
| 51 | 1016 | 0.1% |
| 52 | 518 | < 0.1% |
| 53 | 2610 | |
| 60 | 1045 | 0.1% |
| Value | Count | Frequency (%) |
| 2906700 | 4168 | |
| 2504700 | 2033 | 0.2% |
| 2383912 | 521 | < 0.1% |
| 1595797 | 5130 | |
| 1577385 | 2563 | |
| 1526206 | 3517 | |
| 1417793 | 8 | < 0.1% |
| 1382480 | 2056 | |
| 1312922 | 5075 | |
| 1263321 | 3629 |
job
Categorical
| Distinct | 494 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| Film/video editor | 9779 |
|---|---|
| Exhibition designer | 9199 |
| Naval architect | 8684 |
| Surveyor, land/geomatics | 8680 |
| Materials engineer | 8270 |
| Other values (489) |
Length
| Max length | 59 |
|---|---|
| Median length | 19 |
| Mean length | 20.2271024 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Psychologist, counselling |
|---|---|
| 2nd row | Special educational needs teacher |
| 3rd row | Nature conservation officer |
| 4th row | Patent attorney |
| 5th row | Dance movement psychotherapist |
Common Values
| Value | Count | Frequency (%) |
| Film/video editor | 9779 | 0.8% |
| Exhibition designer | 9199 | 0.7% |
| Naval architect | 8684 | 0.7% |
| Surveyor, land/geomatics | 8680 | 0.7% |
| Materials engineer | 8270 | 0.6% |
| Designer, ceramics/pottery | 8225 | 0.6% |
| Systems developer | 7700 | 0.6% |
| IT trainer | 7679 | 0.6% |
| Financial adviser | 7659 | 0.6% |
| Environmental consultant | 7547 | 0.6% |
| Other values (484) | 1213253 |
Words
| Value | Count | Frequency (%) |
| engineer | 131756 | 4.6% |
| officer | 110915 | 3.9% |
| manager | 61124 | 2.1% |
| scientist | 55878 | 1.9% |
| designer | 52218 | 1.8% |
| surveyor | 49062 | 1.7% |
| teacher | 38126 | 1.3% |
| psychologist | 32600 | 1.1% |
| research | 29754 | 1.0% |
| editor | 28725 | 1.0% |
| Other values (456) | 2289024 |
trans_num
Categorical
| Distinct | 1296675 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| 0b242abb623afc578575680df30655b9 | 1 |
|---|---|
| c85864e7e7cf0be6d1b8597977b8afea | 1 |
| 1a8a2a05638a5503cc6bb8d5735efcc1 | 1 |
| 4556eaf1f7def06eb500325cde4d054e | 1 |
| 5e915d9f88bd09cee9655a470d9bc0bd | 1 |
| Other values (1296670) |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 1296675 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 0b242abb623afc578575680df30655b9 |
|---|---|
| 2nd row | 1f76529f8574734946361c461b024d99 |
| 3rd row | a1a22d70485983eac12b5b88dad1cf95 |
| 4th row | 6b849c168bdad6f867558c3793159a81 |
| 5th row | a41d7549acf90789359a9aa5346dcb46 |
Common Values
| Value | Count | Frequency (%) |
| 0b242abb623afc578575680df30655b9 | 1 | < 0.1% |
| c85864e7e7cf0be6d1b8597977b8afea | 1 | < 0.1% |
| 1a8a2a05638a5503cc6bb8d5735efcc1 | 1 | < 0.1% |
| 4556eaf1f7def06eb500325cde4d054e | 1 | < 0.1% |
| 5e915d9f88bd09cee9655a470d9bc0bd | 1 | < 0.1% |
| 4e0080ea32b67dc251ea824d55ba1f6f | 1 | < 0.1% |
| 541a9a3880dae40c9e7778117adbc89f | 1 | < 0.1% |
| 2c602fbe0404b65cc431b059ed167518 | 1 | < 0.1% |
| 6f9d22d80c0c48e238ecc484d1c64a49 | 1 | < 0.1% |
| c766663cba6e1a1df3623e4f9d6472de | 1 | < 0.1% |
| Other values (1296665) | 1296665 |
Words
| Value | Count | Frequency (%) |
| 0b242abb623afc578575680df30655b9 | 1 | < 0.1% |
| c1d9a7ddb1e34639fe82758de97f4abf | 1 | < 0.1% |
| 189a841a0a8ba03058526bcfe566aab5 | 1 | < 0.1% |
| 83ec1cc84142af6e2acf10c44949e720 | 1 | < 0.1% |
| 6d294ed2cc447d2c71c7171a3d54967c | 1 | < 0.1% |
| fc28024ce480f8ef21a32d64c93a29f5 | 1 | < 0.1% |
| 7bb25a43205191eb7344282b88fc54d3 | 1 | < 0.1% |
| 3b9014ea8fb80bd65de0b1463b00b00e | 1 | < 0.1% |
| 3c74776e558f1499a7824b556e474b1d | 1 | < 0.1% |
| 413636e759663f264aae1819a4d4f231 | 1 | < 0.1% |
| Other values (1296665) | 1296665 |
merch_lat
Real number (ℝ≥0)
| Distinct | 1247805 |
|---|---|
| Distinct (%) | 96.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.53733804 |
| Minimum | 19.027785 |
|---|---|
| Maximum | 67.510267 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 19.027785 |
|---|---|
| 5-th percentile | 29.7516534 |
| Q1 | 34.733572 |
| median | 39.36568 |
| Q3 | 41.957164 |
| 95-th percentile | 46.0035301 |
| Maximum | 67.510267 |
| Range | 48.482482 |
| Interquartile range (IQR) | 7.223592 |
Descriptive statistics
| Standard deviation | 5.10978837 |
|---|---|
| Coefficient of variation (CV) | 0.1325931844 |
| Kurtosis | 0.79599391 |
| Mean | 38.53733804 |
| Median Absolute Deviation (MAD) | 3.397536 |
| Skewness | -0.1819154297 |
| Sum | 49970402.81 |
| Variance | 26.10993718 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 41.305966 | 4 | < 0.1% |
| 41.937796 | 4 | < 0.1% |
| 42.265012 | 4 | < 0.1% |
| 41.301611 | 4 | < 0.1% |
| 34.134994 | 4 | < 0.1% |
| 37.669788 | 4 | < 0.1% |
| 39.348185 | 4 | < 0.1% |
| 32.64469 | 4 | < 0.1% |
| 42.749184 | 4 | < 0.1% |
| 38.050673 | 4 | < 0.1% |
| Other values (1247795) | 1296635 |
| Value | Count | Frequency (%) |
| 19.027785 | 1 | |
| 19.027804 | 1 | |
| 19.029798 | 1 | |
| 19.031242 | 1 | |
| 19.032277 | 1 | |
| 19.033288 | 1 | |
| 19.034282 | 1 | |
| 19.034687 | 1 | |
| 19.035472 | 1 | |
| 19.036312 | 1 |
| Value | Count | Frequency (%) |
| 67.510267 | 1 | |
| 67.441518 | 1 | |
| 67.397018 | 1 | |
| 67.188111 | 1 | |
| 67.064277 | 1 | |
| 66.835174 | 1 | |
| 66.682905 | 1 | |
| 66.67355 | 1 | |
| 66.664673 | 1 | |
| 66.659242 | 1 |
merch_long
Real number (ℝ)
| Distinct | 1275745 |
|---|---|
| Distinct (%) | 98.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -90.2264648 |
| Minimum | -166.671242 |
|---|---|
| Maximum | -66.950902 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1296675 |
| Negative (%) | 100.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | -166.671242 |
|---|---|
| 5-th percentile | -119.3300916 |
| Q1 | -96.8972755 |
| median | -87.438392 |
| Q3 | -80.2367965 |
| 95-th percentile | -73.3542179 |
| Maximum | -66.950902 |
| Range | 99.72034 |
| Interquartile range (IQR) | 16.660479 |
Descriptive statistics
| Standard deviation | 13.77109056 |
|---|---|
| Coefficient of variation (CV) | -0.1526280631 |
| Kurtosis | 1.848479176 |
| Mean | -90.2264648 |
| Median Absolute Deviation (MAD) | 8.227889 |
| Skewness | -1.146959945 |
| Sum | -116994401.2 |
| Variance | 189.6429353 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -87.116414 | 4 | < 0.1% |
| -81.219189 | 4 | < 0.1% |
| -74.618269 | 4 | < 0.1% |
| -85.326323 | 3 | < 0.1% |
| -84.890305 | 3 | < 0.1% |
| -88.49309 | 3 | < 0.1% |
| -84.100102 | 3 | < 0.1% |
| -97.527227 | 3 | < 0.1% |
| -85.3444 | 3 | < 0.1% |
| -86.037494 | 3 | < 0.1% |
| Other values (1275735) | 1296642 |
| Value | Count | Frequency (%) |
| -166.671242 | 1 | |
| -166.670132 | 1 | |
| -166.669638 | 1 | |
| -166.666179 | 1 | |
| -166.664828 | 1 | |
| -166.662888 | 1 | |
| -166.661968 | 1 | |
| -166.659277 | 1 | |
| -166.657834 | 1 | |
| -166.657174 | 1 |
| Value | Count | Frequency (%) |
| -66.950902 | 1 | |
| -66.955996 | 1 | |
| -66.95654 | 1 | |
| -66.958659 | 1 | |
| -66.958751 | 1 | |
| -66.959178 | 1 | |
| -66.961923 | 1 | |
| -66.962913 | 1 | |
| -66.963918 | 1 | |
| -66.963975 | 1 |
is_fraud
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| 0 | 1289169 |
|---|---|
| 1 | 7506 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1289169 | 99.4% |
| 1 | 7506 | 0.6% |
name
Categorical
| Distinct | 973 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| Scott Martin | 4618 |
|---|---|
| Jeffrey Smith | 3592 |
| Barbara Taylor | 3123 |
| Monica Cohen | 3123 |
| Jessica Perez | 3119 |
| Other values (968) | 1279100 |
Length
| Max length | 21 |
|---|---|
| Median length | 13 |
| Mean length | 13.19160931 |
| Min length | 8 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Jennifer Banks |
|---|---|
| 2nd row | Stephanie Gill |
| 3rd row | Edward Sanchez |
| 4th row | Jeremy White |
| 5th row | Tyler Garcia |
Common Values
| Value | Count | Frequency (%) |
| Scott Martin | 4618 | 0.4% |
| Jeffrey Smith | 3592 | 0.3% |
| Barbara Taylor | 3123 | 0.2% |
| Monica Cohen | 3123 | 0.2% |
| Jessica Perez | 3119 | 0.2% |
| Ana Howell | 3117 | 0.2% |
| Keith Sanders | 3113 | 0.2% |
| Christine Harris | 3112 | 0.2% |
| Tammy Ayers | 3110 | 0.2% |
| Mark Wood | 3107 | 0.2% |
| Other values (963) | 1263541 |
Length
| Value | Count | Frequency (%) |
| smith | 28794 | 1.1% |
| christopher | 26669 | 1.0% |
| williams | 23605 | 0.9% |
| james | 22073 | 0.9% |
| davis | 21910 | 0.8% |
| robert | 21667 | 0.8% |
| jessica | 20581 | 0.8% |
| johnson | 20034 | 0.8% |
| michael | 20009 | 0.8% |
| david | 19965 | 0.8% |
| Other values (796) | 2368043 |
trans_year
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.9 MiB |
| 2019 | 924850 |
|---|---|
| 2020 | 371825 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2019 |
|---|---|
| 2nd row | 2019 |
| 3rd row | 2019 |
| 4th row | 2019 |
| 5th row | 2019 |
Common Values
| Value | Count | Frequency (%) |
| 2019 | 924850 | 71.3% |
| 2020 | 371825 | 28.7% |
trans_month
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.142149729 |
| Minimum | 1 |
|---|---|
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.417703308 |
|---|---|
| Coefficient of variation (CV) | 0.5564343852 |
| Kurtosis | -1.04754632 |
| Mean | 6.142149729 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.298515751 |
| Sum | 7964372 |
| Variance | 11.6806959 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5 | 146875 | |
| 6 | 143811 | |
| 3 | 143789 | |
| 12 | 141060 | |
| 4 | 134970 | |
| 1 | 104727 | |
| 2 | 97657 | |
| 8 | 87359 | |
| 7 | 86596 | |
| 9 | 70652 | |
| Other values (2) | 139179 |
| Value | Count | Frequency (%) |
| 1 | 104727 | |
| 2 | 97657 | |
| 3 | 143789 | |
| 4 | 134970 | |
| 5 | 146875 | |
| 6 | 143811 | |
| 7 | 86596 | |
| 8 | 87359 | |
| 9 | 70652 | |
| 10 | 68758 |
| Value | Count | Frequency (%) |
| 12 | 141060 | |
| 11 | 70421 | |
| 10 | 68758 | |
| 9 | 70652 | |
| 8 | 87359 | |
| 7 | 86596 | |
| 6 | 143811 | |
| 5 | 146875 | |
| 4 | 134970 | |
| 3 | 143789 |
trans_week
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.71640002 |
| Minimum | 1 |
|---|---|
| Maximum | 52 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 13 |
| median | 23 |
| Q3 | 36 |
| 95-th percentile | 50 |
| Maximum | 52 |
| Range | 51 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 14.82039325 |
|---|---|
| Coefficient of variation (CV) | 0.5996177937 |
| Kurtosis | -1.011118764 |
| Mean | 24.71640002 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.3086148525 |
| Sum | 32049138 |
| Variance | 219.6440561 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 24 | 39824 | 3.1% |
| 23 | 39508 | 3.0% |
| 22 | 39465 | 3.0% |
| 25 | 37594 | 2.9% |
| 21 | 34093 | 2.6% |
| 12 | 32014 | 2.5% |
| 14 | 31874 | 2.5% |
| 10 | 31843 | 2.5% |
| 16 | 31834 | 2.5% |
| 15 | 31776 | 2.5% |
| Other values (42) | 946850 |
| Value | Count | Frequency (%) |
| 1 | 27187 | |
| 2 | 24072 | |
| 3 | 24108 | |
| 4 | 23745 | |
| 5 | 24027 | |
| 6 | 23834 | |
| 7 | 23903 | |
| 8 | 24012 | |
| 9 | 26779 | |
| 10 | 31843 |
| Value | Count | Frequency (%) |
| 52 | 31281 | |
| 51 | 31360 | |
| 50 | 31606 | |
| 49 | 31570 | |
| 48 | 21677 | |
| 47 | 15720 | |
| 46 | 15679 | |
| 45 | 15828 | |
| 44 | 15968 | |
| 43 | 15933 |
trans_day
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.58797848 |
| Minimum | 1 |
|---|---|
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 15 |
| Q3 | 23 |
| 95-th percentile | 30 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.829121359 |
|---|---|
| Coefficient of variation (CV) | 0.5664057959 |
| Kurtosis | -1.187141658 |
| Mean | 15.58797848 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.03084736374 |
| Sum | 20212542 |
| Variance | 77.95338397 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 47089 | 3.6% |
| 15 | 46213 | 3.6% |
| 8 | 46201 | 3.6% |
| 16 | 44894 | 3.5% |
| 2 | 44748 | 3.5% |
| 9 | 44685 | 3.4% |
| 7 | 44239 | 3.4% |
| 14 | 44015 | 3.4% |
| 28 | 43470 | 3.4% |
| 17 | 42272 | 3.3% |
| Other values (21) | 848849 |
| Value | Count | Frequency (%) |
| 1 | 47089 | |
| 2 | 44748 | |
| 3 | 41842 | |
| 4 | 41479 | |
| 5 | 41886 | |
| 6 | 41420 | |
| 7 | 44239 | |
| 8 | 46201 | |
| 9 | 44685 | |
| 10 | 41934 |
| Value | Count | Frequency (%) |
| 31 | 24701 | |
| 30 | 41019 | |
| 29 | 39617 | |
| 28 | 43470 | |
| 27 | 39684 | |
| 26 | 40692 | |
| 25 | 40374 | |
| 24 | 41360 | |
| 23 | 40815 | |
| 22 | 42061 |
trans_hour
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.80485781 |
| Minimum | 0 |
|---|---|
| Maximum | 23 |
| Zeros | 42502 |
| Zeros (%) | 3.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 19 |
| 95-th percentile | 23 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 6.817823899 |
|---|---|
| Coefficient of variation (CV) | 0.5324404223 |
| Kurtosis | -1.079580292 |
| Mean | 12.80485781 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.2828254537 |
| Sum | 16603739 |
| Variance | 46.48272272 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 23 | 67104 | 5.2% |
| 22 | 66982 | 5.2% |
| 18 | 66051 | 5.1% |
| 16 | 65726 | 5.1% |
| 21 | 65533 | 5.1% |
| 19 | 65508 | 5.1% |
| 17 | 65450 | 5.0% |
| 15 | 65391 | 5.0% |
| 13 | 65314 | 5.0% |
| 12 | 65257 | 5.0% |
| Other values (14) | 638359 |
| Value | Count | Frequency (%) |
| 0 | 42502 | |
| 1 | 42869 | |
| 2 | 42656 | |
| 3 | 42769 | |
| 4 | 41863 | |
| 5 | 42171 | |
| 6 | 42300 | |
| 7 | 42203 | |
| 8 | 42505 | |
| 9 | 42185 |
| Value | Count | Frequency (%) |
| 23 | 67104 | |
| 22 | 66982 | |
| 21 | 65533 | |
| 20 | 65098 | |
| 19 | 65508 | |
| 18 | 66051 | |
| 17 | 65450 | |
| 16 | 65726 | |
| 15 | 65391 | |
| 14 | 64885 |
trans_minute
Real number (ℝ≥0)
| Distinct | 60 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.49528525 |
| Minimum | 0 |
|---|---|
| Maximum | 59 |
| Zeros | 21372 |
| Zeros (%) | 1.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 14 |
| median | 30 |
| Q3 | 44 |
| 95-th percentile | 57 |
| Maximum | 59 |
| Range | 59 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 17.32017992 |
|---|---|
| Coefficient of variation (CV) | 0.5872185935 |
| Kurtosis | -1.200804645 |
| Mean | 29.49528525 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | -0.000393712934 |
| Sum | 38245799 |
| Variance | 299.9886324 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 40 | 21918 | 1.7% |
| 1 | 21867 | 1.7% |
| 7 | 21827 | 1.7% |
| 59 | 21803 | 1.7% |
| 3 | 21797 | 1.7% |
| 14 | 21783 | 1.7% |
| 36 | 21779 | 1.7% |
| 51 | 21777 | 1.7% |
| 27 | 21763 | 1.7% |
| 4 | 21748 | 1.7% |
| Other values (50) | 1078613 |
| Value | Count | Frequency (%) |
| 0 | 21372 | |
| 1 | 21867 | |
| 2 | 21718 | |
| 3 | 21797 | |
| 4 | 21748 | |
| 5 | 21505 | |
| 6 | 21490 | |
| 7 | 21827 | |
| 8 | 21586 | |
| 9 | 21512 |
| Value | Count | Frequency (%) |
| 59 | 21803 | |
| 58 | 21511 | |
| 57 | 21651 | |
| 56 | 21496 | |
| 55 | 21474 | |
| 54 | 21525 | |
| 53 | 21530 | |
| 52 | 21560 | |
| 51 | 21777 | |
| 50 | 21612 |
trans_dayofweek
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.070603659 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 254282 |
| Zeros (%) | 19.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.198152555 |
|---|---|
| Coefficient of variation (CV) | 0.7158698414 |
| Kurtosis | -1.445048986 |
| Mean | 3.070603659 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.07845304063 |
| Sum | 3981575 |
| Variance | 4.831874654 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 254282 | |
| 6 | 250579 | |
| 5 | 200957 | |
| 1 | 160227 | |
| 4 | 152272 | |
| 3 | 147285 | |
| 2 | 131073 |
| Value | Count | Frequency (%) |
| 0 | 254282 | |
| 1 | 160227 | |
| 2 | 131073 | |
| 3 | 147285 | |
| 4 | 152272 | |
| 5 | 200957 | |
| 6 | 250579 |
| Value | Count | Frequency (%) |
| 6 | 250579 | |
| 5 | 200957 | |
| 4 | 152272 | |
| 3 | 147285 | |
| 2 | 131073 | |
| 1 | 160227 | |
| 0 | 254282 |
age
Real number (ℝ≥0)
| Distinct | 83 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 46.02929801 |
| Minimum | 14 |
|---|---|
| Maximum | 96 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 33 |
| median | 44 |
| Q3 | 57 |
| 95-th percentile | 80 |
| Maximum | 96 |
| Range | 82 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 17.38237262 |
|---|---|
| Coefficient of variation (CV) | 0.3776371436 |
| Kurtosis | -0.1760038548 |
| Mean | 46.02929801 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.6122620439 |
| Sum | 59685040 |
| Variance | 302.1468781 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 47 | 41337 | 3.2% |
| 35 | 39331 | 3.0% |
| 34 | 35816 | 2.8% |
| 32 | 35588 | 2.7% |
| 33 | 33430 | 2.6% |
| 45 | 33098 | 2.6% |
| 48 | 32719 | 2.5% |
| 46 | 32212 | 2.5% |
| 44 | 31035 | 2.4% |
| 43 | 30528 | 2.4% |
| Other values (73) | 951581 |
| Value | Count | Frequency (%) |
| 14 | 1318 | 0.1% |
| 15 | 5817 | 0.4% |
| 16 | 5104 | 0.4% |
| 17 | 1191 | 0.1% |
| 18 | 3901 | 0.3% |
| 19 | 8203 | 0.6% |
| 20 | 16326 | |
| 21 | 14915 | |
| 22 | 24536 | |
| 23 | 13209 |
| Value | Count | Frequency (%) |
| 96 | 138 | < 0.1% |
| 95 | 398 | < 0.1% |
| 94 | 1722 | 0.1% |
| 93 | 5684 | |
| 92 | 4450 | |
| 91 | 4824 | |
| 90 | 5443 | |
| 89 | 3916 | |
| 88 | 3843 | |
| 87 | 2364 |
distance
Real number (ℝ≥0)
| Distinct | 1296675 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 76.11247932 |
| Minimum | 0.02227351335 |
|---|---|
| Maximum | 151.8682002 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.9 MiB |
Quantile statistics
| Minimum | 0.02227351335 |
|---|---|
| 5-th percentile | 24.74989046 |
| Q1 | 55.357839 |
| median | 78.26335343 |
| Q3 | 98.46834751 |
| 95-th percentile | 120.4487172 |
| Maximum | 151.8682002 |
| Range | 151.8459267 |
| Interquartile range (IQR) | 43.11050851 |
Descriptive statistics
| Standard deviation | 29.0926998 |
|---|---|
| Coefficient of variation (CV) | 0.3822329802 |
| Kurtosis | -0.6310697847 |
| Mean | 76.11247932 |
| Median Absolute Deviation (MAD) | 21.43870913 |
| Skewness | -0.2382690705 |
| Sum | 98693149.12 |
| Variance | 846.3851815 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 78.77382075 | 1 | < 0.1% |
| 85.45067088 | 1 | < 0.1% |
| 73.47625337 | 1 | < 0.1% |
| 130.2186198 | 1 | < 0.1% |
| 76.57205232 | 1 | < 0.1% |
| 82.91820881 | 1 | < 0.1% |
| 90.01945001 | 1 | < 0.1% |
| 85.98537944 | 1 | < 0.1% |
| 77.8908601 | 1 | < 0.1% |
| 59.78779235 | 1 | < 0.1% |
| Other values (1296665) | 1296665 |
| Value | Count | Frequency (%) |
| 0.02227351335 | 1 | |
| 0.06673123416 | 1 | |
| 0.09405772594 | 1 | |
| 0.1133855774 | 1 | |
| 0.1371995071 | 1 | |
| 0.1538761904 | 1 | |
| 0.2004959165 | 1 | |
| 0.2030157483 | 1 | |
| 0.2218462287 | 1 | |
| 0.2511567388 | 1 |
| Value | Count | Frequency (%) |
| 151.8682002 | 1 | |
| 150.5801916 | 1 | |
| 149.6101271 | 1 | |
| 149.2055714 | 1 | |
| 148.6236717 | 1 | |
| 148.5283365 | 1 | |
| 148.4270844 | 1 | |
| 148.0349077 | 1 | |
| 147.9646726 | 1 | |
| 147.9550441 | 1 |
amt_group
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| above_medium | 324151 |
|---|---|
| medium | 324087 |
| low | 194530 |
| high | 194454 |
| very_low | 129795 |
Length
| Max length | 12 |
|---|---|
| Median length | 6 |
| Mean length | 7.250098907 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | low |
|---|---|
| 2nd row | high |
| 3rd row | very_high |
| 4th row | medium |
| 5th row | medium |
Common Values
| Value | Count | Frequency (%) |
| above_medium | 324151 | 25.0% |
| medium | 324087 | 25.0% |
| low | 194530 | 15.0% |
| high | 194454 | 15.0% |
| very_low | 129795 | 10.0% |
| very_high | 129658 | 10.0% |
age_group
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1318 |
| Missing (%) | 0.1% |
| Memory size | 1.2 MiB |
| 24_34 | 278361 |
|---|---|
| 34_44 | 271553 |
| 44_54 | 269899 |
| 54_64 | 163629 |
| below_24 | 109603 |
| Other values (4) | 202312 |
Length
| Max length | 8 |
|---|---|
| Median length | 5 |
| Mean length | 5.255077944 |
| Min length | 5 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 24_34 |
|---|---|
| 2nd row | 34_44 |
| 3rd row | 54_64 |
| 4th row | 44_54 |
| 5th row | 24_34 |
Common Values
| Value | Count | Frequency (%) |
| 24_34 | 278361 | 21.5% |
| 34_44 | 271553 | 20.9% |
| 44_54 | 269899 | 20.8% |
| 54_64 | 163629 | 12.6% |
| below_24 | 109603 | 8.5% |
| 64_74 | 104141 | 8.0% |
| 74_84 | 57671 | 4.4% |
| 84_94 | 39964 | 3.1% |
| above_94 | 536 | < 0.1% |
| (Missing) | 1318 | 0.1% |
Length
Pie chart
| Value | Count | Frequency (%) |
| 24_34 | 278361 | |
| 34_44 | 271553 | |
| 44_54 | 269899 | |
| 54_64 | 163629 | |
| below_24 | 109603 | 8.5% |
| 64_74 | 104141 | 8.0% |
| 74_84 | 57671 | 4.5% |
| 84_94 | 39964 | 3.1% |
| above_94 | 536 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
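The rank-then-correlate recipe can be illustrated with a minimal pure-Python sketch. The helper names `average_ranks`, `pearson`, and `spearman_rho` and the toy data are assumptions made for illustration; this is not the code the report generator runs. Tied values receive the average of their rank positions, which is one common convention.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so raw sums are used. Assumes neither input is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)  # the ranks agree perfectly
r = pearson(x, y)         # below 1 because the relation is not linear
print(rho, r)
```

On this toy input the ranks of x and y coincide, so ρ reaches its maximum while Pearson's r stays below it.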
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
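As a small illustration, the sketch below computes r by hand for the `lat` and `merch_lat` values of the first five sample rows shown in this report, and checks the location and scale invariance numerically. The function name `pearson` and the particular shifts and scale factors are assumptions chosen for the example; five rows are far too few to reproduce the report's full-dataset figures.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel, so raw sums are used. Assumes neither input is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# lat and merch_lat from the first five sample rows of the report.
lat = [36.0788, 48.8878, 42.1808, 46.2306, 38.4207]
merch_lat = [36.011293, 49.159047, 43.150704, 47.034331, 38.674999]

r = pearson(lat, merch_lat)

# Shift and rescale each variable separately (positive scale factors).
lat_scaled = [2.0 * v + 10.0 for v in lat]
merch_scaled = [0.5 * v - 3.0 for v in merch_lat]
r_scaled = pearson(lat_scaled, merch_scaled)

print(r, r_scaled)  # the two values agree up to floating-point error
```

A negative scale factor on one variable would flip the sign of r while leaving its magnitude unchanged.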
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
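The pair-counting definition can be sketched directly. The function name `kendall_tau_a` and the toy data are assumptions for illustration. This is the plain form described in the paragraph, where tied pairs count as neither concordant nor discordant; libraries commonly use a tie-corrected variant, so their results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, by brute force.

    Quadratic in the number of observations, so only suitable for small inputs.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): two adjacent swaps relative to perfect order.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10
```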
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
First rows
| | trans_datetime | cc_num | merchant | category | amt | gender | street | city | state | zip | lat | long | city_pop | job | trans_num | merch_lat | merch_long | is_fraud | name | coords_ori | coords_merch | trans_year | trans_month | trans_week | trans_day | trans_hour | trans_minute | trans_dayofweek | age | distance | amt_group | age_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 00:00:18 | 2703186189652095 | fraud_Rippin,_Kub_and_Mann | misc_net | 4.97 | F | 561 Perry Cove | Moravian_Falls | NC | 28654 | 36.0788 | -81.1781 | 3495 | Psychologist, counselling | 0b242abb623afc578575680df30655b9 | 36.011293 | -82.048315 | 0 | Jennifer Banks | (36.0788, -81.1781) | (36.011293, -82.048315) | 2019 | 1 | 1 | 1 | 0 | 0 | 1 | 31 | 78.773821 | low | 24_34 |
| 1 | 2019-01-01 00:00:44 | 630423337322 | fraud_Heller,_Gutmann_and_Zieme | grocery_pos | 107.23 | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1f76529f8574734946361c461b024d99 | 49.159047 | -118.186462 | 0 | Stephanie Gill | (48.8878, -118.2105) | (49.159047, -118.186462) | 2019 | 1 | 1 | 1 | 0 | 0 | 1 | 41 | 30.216618 | high | 34_44 |
| 2 | 2019-01-01 00:00:51 | 38859492057661 | fraud_Lind-Buckridge | entertainment | 220.11 | M | 594 White Dale Suite 530 | Malad_City | ID | 83252 | 42.1808 | -112.2620 | 4154 | Nature conservation officer | a1a22d70485983eac12b5b88dad1cf95 | 43.150704 | -112.154481 | 0 | Edward Sanchez | (42.1808, -112.262) | (43.150704, -112.154481) | 2019 | 1 | 1 | 1 | 0 | 0 | 1 | 57 | 108.102912 | very_high | 54_64 |
| 3 | 2019-01-01 00:01:16 | 3534093764340240 | fraud_Kutch,_Hermiston_and_Farrell | gas_transport | 45.00 | M | 9443 Cynthia Court Apt. 038 | Boulder | MT | 59632 | 46.2306 | -112.1138 | 1939 | Patent attorney | 6b849c168bdad6f867558c3793159a81 | 47.034331 | -112.561071 | 0 | Jeremy White | (46.2306, -112.1138) | (47.034331, -112.561071) | 2019 | 1 | 1 | 1 | 0 | 1 | 1 | 52 | 95.685115 | medium | 44_54 |
| 4 | 2019-01-01 00:03:06 | 375534208663984 | fraud_Keeling-Crist | misc_pos | 41.96 | M | 408 Bradley Rest | Doe_Hill | VA | 24433 | 38.4207 | -79.4629 | 99 | Dance movement psychotherapist | a41d7549acf90789359a9aa5346dcb46 | 38.674999 | -78.632459 | 0 | Tyler Garcia | (38.4207, -79.4629) | (38.674999, -78.632459) | 2019 | 1 | 1 | 1 | 0 | 3 | 1 | 33 | 77.702395 | medium | 24_34 |
| 5 | 2019-01-01 00:04:08 | 4767265376804500 | fraud_Stroman,_Hudson_and_Erdman | gas_transport | 94.63 | F | 4655 David Island | Dublin | PA | 18917 | 40.3750 | -75.2045 | 2158 | Transport planner | 189a841a0a8ba03058526bcfe566aab5 | 40.653382 | -76.152667 | 0 | Jennifer Conner | (40.375, -75.2045) | (40.653382, -76.15266700000001) | 2019 | 1 | 1 | 1 | 0 | 4 | 1 | 58 | 86.097358 | high | 54_64 |
| 6 | 2019-01-01 00:04:42 | 30074693890476 | fraud_Rowe-Vandervort | grocery_net | 44.54 | F | 889 Sarah Station Suite 624 | Holcomb | KS | 67851 | 37.9931 | -100.9893 | 2691 | Arboriculturist | 83ec1cc84142af6e2acf10c44949e720 | 37.162705 | -100.153370 | 0 | Kelsey Richards | (37.9931, -100.9893) | (37.162705, -100.15337) | 2019 | 1 | 1 | 1 | 0 | 4 | 1 | 26 | 118.094855 | medium | 24_34 |
| 7 | 2019-01-01 00:05:08 | 6011360759745864 | fraud_Corwin-Collins | gas_transport | 71.65 | M | 231 Flores Pass Suite 720 | Edinburg | VA | 22824 | 38.8432 | -78.6003 | 6018 | Designer, multimedia | 6d294ed2cc447d2c71c7171a3d54967c | 38.948089 | -78.540296 | 0 | Steven Williams | (38.8432, -78.6003) | (38.948089, -78.540296) | 2019 | 1 | 1 | 1 | 0 | 5 | 1 | 72 | 12.754714 | above_medium | 64_74 |
| 8 | 2019-01-01 00:05:18 | 4922710831011201 | fraud_Herzog_Ltd | misc_pos | 4.27 | F | 6888 Hicks Stream Suite 954 | Manor | PA | 15665 | 40.3359 | -79.6607 | 1472 | Public affairs consultant | fc28024ce480f8ef21a32d64c93a29f5 | 40.351813 | -79.958146 | 0 | Heather Chase | (40.3359, -79.6607) | (40.351813, -79.958146) | 2019 | 1 | 1 | 1 | 0 | 5 | 1 | 78 | 25.333883 | low | 74_84 |
| 9 | 2019-01-01 00:06:01 | 2720830304681674 | fraud_Schoen,_Kuphal_and_Nitzsche | grocery_pos | 198.39 | F | 21326 Taylor Squares Suite 708 | Clarksville | TN | 37040 | 36.5220 | -87.3490 | 151785 | Pathologist | 3b9014ea8fb80bd65de0b1463b00b00e | 37.179198 | -87.485381 | 0 | Melissa Aguilar | (36.522, -87.34899999999999) | (37.179198, -87.485381) | 2019 | 1 | 1 | 1 | 0 | 6 | 1 | 45 | 73.939714 | very_high | 44_54 |
Last rows
| | trans_datetime | cc_num | merchant | category | amt | gender | street | city | state | zip | lat | long | city_pop | job | trans_num | merch_lat | merch_long | is_fraud | name | coords_ori | coords_merch | trans_year | trans_month | trans_week | trans_day | trans_hour | trans_minute | trans_dayofweek | age | distance | amt_group | age_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1296665 | 2020-06-21 12:08:42 | 213193596103206 | fraud_Gulgowski_LLC | home | 72.17 | M | 7369 Gabriel Tunnel | Pointe_Aux_Pins | MI | 49775 | 45.7549 | -84.4470 | 95 | Electrical engineer | 108c103b26f686c24c021aaf4210977e | 44.938461 | -83.996234 | 0 | James Hunt | (45.7549, -84.447) | (44.938461, -83.996234) | 2020 | 6 | 25 | 21 | 12 | 8 | 6 | 26 | 97.371601 | above_medium | 24_34 |
| 1296666 | 2020-06-21 12:09:22 | 4587657402165341815 | fraud_Hyatt,_Russel_and_Gleichner | health_fitness | 7.30 | F | 6296 John Keys Suite 858 | Pembroke_Township | IL | 60958 | 41.0646 | -87.5917 | 2135 | Psychotherapist, child | 37a18c6fb0c5c722b6339ffedc82f55a | 40.556811 | -88.092339 | 0 | Amber Lewis | (41.0646, -87.5917) | (40.556811, -88.092339) | 2020 | 6 | 25 | 21 | 12 | 9 | 6 | 16 | 70.456765 | low | below_24 |
| 1296667 | 2020-06-21 12:10:56 | 4822367783500458 | fraud_Hahn,_Douglas_and_Schowalter | travel | 19.71 | M | 97070 Anderson Land | Haines_City | FL | 33844 | 28.0758 | -81.5929 | 33804 | Exercise physiologist | 34e72e0a659a6c8f4a20ee65594f3a7d | 27.465871 | -81.511804 | 0 | Christopher Farrell | (28.0758, -81.5929) | (27.465871000000003, -81.511804) | 2020 | 6 | 25 | 21 | 12 | 10 | 6 | 29 | 68.060786 | medium | 24_34 |
| 1296668 | 2020-06-21 12:11:23 | 213141712584544 | fraud_Metz,_Russel_and_Metz | kids_pets | 100.85 | F | 742 Oneill Shore | Florence | MS | 39073 | 32.1530 | -90.1217 | 19685 | Fine artist | 0d86d8c17638d7eff77db9c6a878b477 | 31.377697 | -90.528450 | 0 | Margaret Curtis | (32.153, -90.1217) | (31.377697, -90.52845) | 2020 | 6 | 25 | 21 | 12 | 11 | 6 | 36 | 94.208072 | high | 34_44 |
| 1296669 | 2020-06-21 12:11:36 | 4400011257587661852 | fraud_Stiedemann_Inc | misc_pos | 37.38 | F | 474 Allen Haven | North_Loup | NE | 68859 | 41.4972 | -98.7858 | 509 | Nurse, children's | 9a7ea2625cf8303efe34e3c09546868f | 41.728638 | -99.039660 | 0 | Marissa Powell | (41.4972, -98.7858) | (41.728638, -99.03966) | 2020 | 6 | 25 | 21 | 12 | 11 | 6 | 40 | 33.293541 | medium | 34_44 |
| 1296670 | 2020-06-21 12:12:08 | 30263540414123 | fraud_Reichel_Inc | entertainment | 15.56 | M | 162 Jessica Row Apt. 072 | Hatch | UT | 84735 | 37.7175 | -112.4777 | 258 | Geoscientist | 440b587732da4dc1a6395aba5fb41669 | 36.841266 | -111.690765 | 0 | Erik Patterson | (37.7175, -112.4777) | (36.841266, -111.69076499999998) | 2020 | 6 | 25 | 21 | 12 | 12 | 6 | 59 | 119.696415 | medium | 54_64 |
| 1296671 | 2020-06-21 12:12:19 | 6011149206456997 | fraud_Abernathy_and_Sons | food_dining | 51.70 | M | 8617 Holmes Terrace Suite 651 | Tuscarora | MD | 21790 | 39.2667 | -77.5101 | 100 | Production assistant, television | 278000d2e0d2277d1de2f890067dcc0a | 38.906881 | -78.246528 | 0 | Jeffrey White | (39.2667, -77.5101) | (38.906881, -78.246528) | 2020 | 6 | 25 | 21 | 12 | 12 | 6 | 41 | 75.202184 | above_medium | 34_44 |
| 1296672 | 2020-06-21 12:12:32 | 3514865930894695 | fraud_Stiedemann_Ltd | food_dining | 105.93 | M | 1632 Cohen Drive Suite 639 | High_Rolls_Mountain_Park | NM | 88325 | 32.9396 | -105.8189 | 899 | Naval architect | 483f52fe67fabef353d552c1e662974c | 33.619513 | -105.130529 | 0 | Christopher Castaneda | (32.9396, -105.8189) | (33.619513, -105.130529) | 2020 | 6 | 25 | 21 | 12 | 12 | 6 | 53 | 98.987927 | high | 44_54 |
| 1296673 | 2020-06-21 12:13:36 | 2720012583106919 | fraud_Reinger,_Weissnat_and_Strosin | food_dining | 74.90 | M | 42933 Ryan Underpass | Manderson | SD | 57756 | 43.3526 | -102.5411 | 1126 | Volunteer coordinator | d667cdcbadaaed3da3f4020e83591c83 | 42.788940 | -103.241160 | 0 | Joseph Murray | (43.3526, -102.5411) | (42.78894, -103.24116) | 2020 | 6 | 25 | 21 | 12 | 13 | 6 | 40 | 84.688356 | above_medium | 34_44 |
| 1296674 | 2020-06-21 12:13:37 | 4292902571056973207 | fraud_Langosh,_Wintheiser_and_Hyatt | food_dining | 4.30 | M | 135 Joseph Mountains | Sula | MT | 59871 | 45.8433 | -113.8748 | 218 | Therapist, horticultural | 8f7c8e4ab7f25875d753b422917c98c9 | 46.565983 | -114.186110 | 0 | Jeffrey Smith | (45.8433, -113.8748) | (46.565983, -114.18611) | 2020 | 6 | 25 | 21 | 12 | 13 | 6 | 25 | 83.845902 | low | 24_34 |